Collaborator

@tomvothecoder tomvothecoder commented Oct 9, 2025

Motivation

Previously, e3sm_to_cmip always returned exit code 0, even if multiple handlers failed.
This made it difficult for calling workflows or CI/CD pipelines to detect and respond to errors automatically.

This enhancement introduces explicit, user-controlled failure semantics while keeping the default behavior unchanged for backward compatibility.

Overview

This PR improves how e3sm_to_cmip reports and responds to handler failures across all run modes. It introduces a new command-line flag, --on-var-failure, which allows users to control how e3sm_to_cmip behaves when one or more variable handlers fail during CMORization or info-mode checks.

The new flag replaces the implicit always-succeed behavior with three explicit modes:

| Option | Description | Exit Code on Failure |
| --- | --- | --- |
| ignore (default) | Continue processing regardless of any handler failures. | 0 |
| fail | Process all handlers, but exit with code 1 if any fail. | 1 |
| stop | Exit immediately when the first handler fails. | 1 |

Comparison of before and after this PR:

| Aspect | Before this PR | After this PR |
| --- | --- | --- |
| Failure handling | Always returned exit code 0, even if handlers failed. | Exit code reflects failures based on --on-var-failure (ignore, fail, stop). |
| User control | No way to stop or fail early; failures were only logged. | Users can choose: continue ("ignore"), exit non-zero after processing all handlers ("fail"), or stop immediately ("stop"). |
| Parallel mode behavior | Did not exit on the first failure. | With --on-var-failure=stop, gracefully cancels pending jobs but allows active ones to complete before exiting. |
| Info mode consistency | Did not respect failure semantics. | Fully honors --on-var-failure, logging and exiting consistently. |
| Exit codes | Always 0. | 0 (success) or 1 (any failure, depending on mode). |

Result: More predictable, script-friendly behavior, improved workflow integration, and safer, cleaner exits during parallel CMORization runs.

Closes #272

Details

  • Added a new CLI argument:
     --on-var-failure {ignore,fail,stop}
  • Updated _run_serial(), _run_parallel(), and _run_info_mode() to honor self.on_var_failure.
  • Introduced two helper methods for consistent behavior:
    • _handle_failed_handler() — logs, records, and optionally triggers immediate exit.
    • _finalize_failure_exit() — applies final exit logic at the end of processing.
  • Refactored shared failure logic to reduce duplication and improve testability.
  • Preserved all existing behaviors when --on-var-failure=ignore (default).
  • Maintained backward compatibility — no existing workflows are broken.
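
For illustration, the new argument could be declared roughly as follows. This is a minimal sketch using the standard argparse module, not the actual contents of e3sm_to_cmip/argparser.py; the `dest` name and the help text are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a minimal parser exposing only the failure-mode flag."""
    parser = argparse.ArgumentParser(prog="e3sm_to_cmip")
    parser.add_argument(
        "--on-var-failure",
        dest="on_var_failure",  # assumed attribute name
        choices=["ignore", "fail", "stop"],
        default="ignore",  # keeps the pre-PR behavior when the flag is omitted
        help=(
            "Behavior when a variable handler fails: 'ignore' continues and "
            "exits 0, 'fail' processes everything then exits 1, 'stop' exits "
            "1 at the first failure."
        ),
    )
    return parser


args = build_parser().parse_args(["--on-var-failure", "fail"])
print(args.on_var_failure)
```

Restricting the values with `choices` means an unsupported mode is rejected by argparse itself before any handler runs.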

How Failures Are Handled in Parallel Mode with stop

With --on-var-failure=stop, parallel mode gracefully cancels pending jobs but allows active ones to complete before exiting.

Unlike serial and info modes, which exit immediately, this gives cleaner shutdowns, fewer partial outputs, and consistent logs and progress updates during parallel CMORization runs.
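
The cancel-pending, finish-running pattern can be sketched with concurrent.futures. This is an assumption-laden illustration, not the code in runner.py: `run_handler` and `run_parallel_stop` are hypothetical names, and a thread pool stands in for whatever executor the runner actually uses.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_handler(name: str, ok: bool, seconds: float) -> bool:
    """Hypothetical stand-in for one handler's CMORization; True on success."""
    time.sleep(seconds)
    return ok


def run_parallel_stop(jobs: list[tuple[str, bool, float]], workers: int = 2) -> int:
    """Return the exit code a 'stop' run would use (0 or 1)."""
    pool = ThreadPoolExecutor(max_workers=workers)
    futures = {pool.submit(run_handler, *job): job[0] for job in jobs}
    exit_code = 0

    for future in as_completed(futures):
        if not future.result():
            print(f"{futures[future]} failed; cancelling pending jobs")
            # Futures that have not started are cancelled; futures that are
            # already running finish, because wait=True blocks until they do.
            pool.shutdown(wait=True, cancel_futures=True)
            exit_code = 1
            break
    else:
        # No failure was seen, so shut down normally.
        pool.shutdown(wait=True)

    return exit_code
```

Note that a job picked up by a freed worker just before `shutdown` is called still runs to completion; only jobs still waiting in the queue are cancelled. `cancel_futures` requires Python 3.9 or later.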

Implementation Notes

  • self.on_var_failure is now a class attribute shared across serial, parallel, and info modes.
  • Each handler failure is recorded in failed_handlers and processed through _handle_failed_handler().
  • _finalize_failure_exit() centralizes the exit decision logic.
  • All progress-bar and logging behavior remains unchanged.
  • _run_info_mode() now also respects --on-var-failure, treating missing variables or invalid table entries as handler failures.
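
A rough sketch of how the two helpers might fit together in serial mode. The method names come from this description, but the bodies, the `RunnerSketch` class, and `run_serial` are assumptions made for illustration only.

```python
import logging
import sys

logger = logging.getLogger(__name__)


class RunnerSketch:
    """Hypothetical stand-in for the runner; only failure handling is modeled."""

    def __init__(self, on_var_failure: str = "ignore"):
        self.on_var_failure = on_var_failure

    def _handle_failed_handler(self, name: str, failed_handlers: list[str]) -> None:
        """Log and record a failure; exit right away in 'stop' mode."""
        logger.error("Handler '%s' failed.", name)
        failed_handlers.append(name)
        if self.on_var_failure == "stop":
            sys.exit(1)

    def _finalize_failure_exit(self, failed_handlers: list[str]) -> None:
        """Exit non-zero at the end of the run in 'fail' mode if anything failed."""
        if failed_handlers and self.on_var_failure == "fail":
            sys.exit(1)

    def run_serial(self, results: dict[str, bool]) -> list[str]:
        """Process handlers in order; `results` maps handler name to success."""
        failed_handlers: list[str] = []
        for name, ok in results.items():
            if not ok:
                self._handle_failed_handler(name, failed_handlers)
        self._finalize_failure_exit(failed_handlers)
        return failed_handlers
```

Keeping the per-failure decision and the end-of-run decision in separate methods lets every run mode share the same two call sites.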

Backward Compatibility

  • Default remains --on-var-failure=ignore.
  • No changes required for existing workflows or scripts.
  • Pipelines can now use non-zero exit codes for error handling if desired.
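
As an example of the last point, a calling workflow could branch on the return code. The sketch below is hypothetical: `run_step` is not part of e3sm_to_cmip, and the commands are stand-ins that simply exit with 0 or 1 so the snippet runs anywhere.

```python
import subprocess
import sys


def run_step(cmd: list[str]) -> bool:
    """Run one pipeline step and report whether it exited with code 0."""
    completed = subprocess.run(cmd, check=False)
    if completed.returncode != 0:
        print(f"step failed with exit code {completed.returncode}: {cmd[0]}")
        return False
    return True


# Stand-in commands; a real pipeline would pass an e3sm_to_cmip invocation
# that includes "--on-var-failure fail" (or "stop") instead.
ok_cmd = [sys.executable, "-c", "import sys; sys.exit(0)"]
bad_cmd = [sys.executable, "-c", "import sys; sys.exit(1)"]

print(run_step(ok_cmd), run_step(bad_cmd))
```

With the default ignore mode the step would still return 0 even when handlers fail, so a check like this only becomes meaningful once fail or stop is requested.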

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • My changes generate no new warnings
  • Any dependent changes have been merged and published in downstream modules

If applicable:

  • New and existing unit tests pass with my changes (locally and CI/CD build)
  • I have added tests that prove my fix is effective or that my feature works
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have noted that this is a breaking change for a major release (fix or feature that would cause existing functionality to not work as expected)

@tomvothecoder
Collaborator Author

tomvothecoder commented Oct 9, 2025

Integration Test Checklist

General Behavior

  • Verify that --on-var-failure appears in CLI help text with choices {ignore,fail,stop}.
  • Confirm default behavior is ignore when flag is omitted.
  • Ensure help text clearly describes each mode.

Serial Mode (_run_serial)

  • When all handlers succeed → exit code 0.
  • When one handler fails and --on-var-failure=ignore → continue, exit 0.
  • When one handler fails and --on-var-failure=fail → process remaining handlers, exit 1.
    • Missing handler
    • Non-derivable handler
    • CMORization error -- must manually introduce CMOR API error ad-hoc
  • When one handler fails and --on-var-failure=stop → stop immediately, exit 1.
    • Missing handler
    • Non-derivable handler
    • CMORization error -- must manually introduce CMOR API error ad-hoc
  • Confirm all successes/failures are logged correctly via _log_handler_status().
  • Ensure _log_final_result() still reports accurate counts (success vs failure).

Parallel Mode (_run_parallel)

  • When all handlers succeed → exit code 0.
  • When one handler fails and --on-var-failure=ignore → continue other futures, exit 0.
  • When one handler fails and --on-var-failure=fail → complete remaining futures, exit 1.
    • Missing handler
    • Non-derivable handler
    • CMORization error -- must manually introduce CMOR API error ad-hoc -- @TonyB9000 test this
  • When one handler fails and --on-var-failure=stop → cancel remaining futures, exit 1.
    • Missing handler
    • Non-derivable handler
    • CMORization error -- must manually introduce CMOR API error ad-hoc -- @TonyB9000 test this
  • Validate pool.shutdown(cancel_futures=True) is invoked on stop -- based on @TonyB9000 test
  • Verify log output matches serial mode semantics.
  • Ensure progress bar updates correctly even when some futures fail.

Info Mode (_run_info_mode)

  • When all variable checks pass → exit code 0.
  • When one variable is missing and --on-var-failure=ignore → continue checking, exit 0.
  • When one variable is missing and --on-var-failure=fail → process all, exit 1.
  • When one variable is missing and --on-var-failure=stop → stop checking immediately, exit 1.
  • Verify missing table entries are logged and treated as failures.
  • Validate YAML output is still written unless stopped early.
  • Confirm info.yaml and info_out_path outputs remain identical to prior behavior when successful.

Logging & Exit Behavior

  • Check that all handler failures log the handler name and reason.
  • Verify “Stopping immediately due to --on-var-failure=stop” appears in logs for early exit.
  • Verify “Exiting with code 1 (--on-var-failure=fail)” appears in logs when applicable.
  • Confirm sys.exit(1) is invoked only when appropriate (not during ignore mode).
  • Ensure no unhandled exceptions escape the process.
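
One way to check the sys.exit(1) item is to patch sys.exit and inspect how it was called. The `finalize` function below is a hypothetical stand-in for the end-of-run exit decision, not the project's implementation.

```python
import sys
from unittest import mock


def finalize(on_var_failure: str, failed_handlers: list[str]) -> None:
    """Hypothetical stand-in for the end-of-run exit decision."""
    if failed_handlers and on_var_failure in ("fail", "stop"):
        sys.exit(1)


def exit_calls(on_var_failure: str, failed_handlers: list[str]) -> list:
    """Return the argument tuples sys.exit was called with (empty if never)."""
    with mock.patch.object(sys, "exit") as fake_exit:
        finalize(on_var_failure, failed_handlers)
    return [call.args for call in fake_exit.call_args_list]
```

Because sys.exit is replaced by a mock inside the `with` block, the test process keeps running and the recorded calls can be asserted on directly.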

Regression & Compatibility

  • Run existing E3SM-to-CMIP regression tests to confirm no behavior changes when flag is omitted.
  • Verify old CLI invocations without --on-var-failure still return 0 even with failures.
  • Validate that error handling works consistently for MPAS and non-MPAS realms.
  • Confirm identical results whether run with or without progress bars (atm realm case).

@TonyB9000
Contributor

@tomvothecoder Outstanding work, both for exit handling and for elucidating the "--info" mode options. (I can't imagine testing all combinations).

@tomvothecoder
Collaborator Author

> @tomvothecoder Outstanding work, both for exit handling and for elucidating the "--info" mode options.

I'm happy to get this going and hope it significantly improves the efficiency of debugging publication workflows!

> (I can't imagine testing all combinations).

Thankfully in past refactoring efforts, I generated test cases with regridded data that I can re-use to test these cases. I'll post the test scripts above once I complete testing.

@TonyB9000
Contributor

TonyB9000 commented Oct 9, 2025

I have v3 data where I know failures will occur (CFmon.clisccp "NaNs" issue is one, and piClim-control-iceini triggers CMOR failure due to bad user-metadata file). The former may fail in NCO phase - before e2c is called, however.

Previously, when info-mode failed, I was not catching that fact, and was passing an empty var-list to ncclimo, which responds by extracting ALL variables and attempting regrid on everything - not the best default in my view.

You can induce a metadata failure by introducing a term in the "activity_id" that is not in the current CV:

<   "activity_id": "RFMIP AerChemMIP",
---
>   "activity_id": "RFMIP",

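A small helper could produce such a faulty metadata file for testing. This is only a sketch mirroring the diff above: the function name is hypothetical, and it assumes the user-metadata file is plain JSON with a string-valued "activity_id" key.

```python
import json
from pathlib import Path


def write_bad_metadata(src: Path, dst: Path, extra_term: str = "AerChemMIP") -> dict:
    """Copy a user-metadata JSON file, appending an extra term to activity_id."""
    metadata = json.loads(src.read_text())
    # Assumption: appending a term absent from the CV for this experiment is
    # what triggers the CMOR metadata failure described above.
    metadata["activity_id"] = f"{metadata['activity_id']} {extra_term}"
    dst.write_text(json.dumps(metadata, indent=2))
    return metadata
```

The source file is left untouched, so the same good metadata can be reused for the passing cases.
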
@TonyB9000
Contributor

@tomvothecoder The code looks very good! I downloaded the branch so I could follow the logic more completely. (Minor initial confusion on terms: I knew that a failed "info" mode would indicate a "handler" failure (failure to resolve a handler), but did not think of a runtime CMOR failure (perhaps bad data) as being a "failed handler". I get it now: all failures pass through "finalize_failure_exit".)

I will create a dev env to install and test this behavior (both info mode, and runtime CMOR errors)

@tomvothecoder
Collaborator Author

> @tomvothecoder The code looks very good! I downloaded the branch so I could follow the logic more completely. (Minor initial confusion on terms: I knew that a failed "info" mode would indicate a "handler" failure (failure to resolve a handler), but did not think of a runtime CMOR failure (perhaps bad data) as being a "failed handler". I get it now: all failures pass through "finalize_failure_exit".)
>
> I will create a dev env to install and test this behavior (both info mode, and runtime CMOR errors)

@TonyB9000 Heads up, I'm still working on the code to ensure the implementation is complete and correct. I'll tag you again as needed. Thanks for being eager to test!

Collaborator Author

@tomvothecoder tomvothecoder left a comment

@TonyB9000 In my most recent commit, 258794a (#323), I applied the behaviors of stop and fail on the initial process of deriving variable handlers.

These two cases will now end with sys.exit(1):

  1. No handler is defined for a variable (i.e., the handler is missing).
  2. A handler is defined for a variable, but the input datasets don't have the necessary raw E3SM variables (i.e., the handler is non-derivable).

I've highlighted the relevant code below in my review.

- Fix bug in `_get_handlers()` not instantiating `missing_handlers` after `_get_mpas_handlers()` call
- Add FIXME: comments for duplicate code
- Extract stop behaviors to `_stop_with_failed_handler()` and `_stop_with_failed_handler_parallel()`
@tomvothecoder tomvothecoder requested a review from Copilot October 10, 2025 22:12
Copilot AI left a comment

Pull Request Overview

This PR adds a new --on-var-failure command-line flag to provide explicit control over how e3sm_to_cmip handles variable handler failures, replacing the previous always-succeed behavior with user-configurable exit modes.

Key changes:

  • Added --on-var-failure flag with three modes: ignore (default), fail, and stop
  • Updated all run modes (serial, parallel, info) to respect failure semantics and exit appropriately
  • Enhanced logging and error reporting to provide clearer feedback on handler failures

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| e3sm_to_cmip/argparser.py | Adds the new --on-var-failure command-line argument |
| e3sm_to_cmip/runner.py | Core logic updates for failure handling across all run modes |
| e3sm_to_cmip/util.py | Adds convenience functions for consistent exit behavior |
| e3sm_to_cmip/cmor_handlers/utils.py | Updates handler loading functions to return missing/non-derivable handlers |
| e3sm_to_cmip/cmor_handlers/handler.py | Adds type alias for handler dictionaries |
| tests/cmor_handlers/test_utils.py | Updates test assertions to handle new tuple return values |
| docs/source/usage.rst | Documents the new flag and its behavior |

Collaborator Author

@tomvothecoder tomvothecoder left a comment

Hey @TonyB9000, this PR is ready for your code review and testing.

I've tested most of these cases successfully. The only ones that need more thorough testing are CMOR failures during "stop" mode with parallel processing. It should stop gracefully by allowing running jobs to complete and cancelling pending jobs, which avoids partial outputs and logs. Can you try to run this case? For example, run 2-3 variables and introduce faulty values for one of the variables so that it causes CMOR errors.

Comment on lines +173 to +183
"Variable Failure Behavior (--on-var-failure)": self.on_var_failure,
"Variable List (--var-list)": f"{self.var_list} ({len(self.var_list)})",
"Input Path (--input-path)": self.input_path,
"Output Path (--output-path)": self.output_path,
"Precheck Path (--precheck)": self.precheck_path,
"Log Path (--logdir)": self.log_path,
"CMOR Log Path (--logdir)": self.cmor_log_dir,
"CMIP Metadata Path (--user-metadata)": self.new_metadata_path,
"Temp Path for Processing MPAS Files": self.temp_path,
"Frequency": self.freq,
"Realm": self.realm,
"Frequency (--freq)": self.freq,
"Realm (--realm)": self.realm,
Collaborator Author

Improved log summary for e2c configuration, now includes the CLI arg if applicable.

  if self.info_mode:
      self._run_info_mode()
-     sys.exit(0)
+     exit_success()
Collaborator Author

Replaced sys.exit(0) and sys.exit(1) with exit_success() and exit_failure(), respectively.
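
A minimal sketch of what such helpers might look like. The success message matches the console output quoted later in this thread; the failure message and the exact signatures are assumptions rather than the contents of e3sm_to_cmip/util.py.

```python
import logging
import sys
from typing import NoReturn

logger = logging.getLogger(__name__)


def exit_success() -> NoReturn:
    """Log and exit with status 0."""
    logger.info("Exiting with success code (0).")
    sys.exit(0)


def exit_failure() -> NoReturn:
    """Log and exit with status 1."""
    # Assumed wording; only the success message appears in this thread's logs.
    logger.error("Exiting with failure code (1).")
    sys.exit(1)
```

Routing every exit through two small functions gives each exit a log line and a single place to patch in tests.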

Comment on lines +513 to +556
    def _log_handler_summary(self):
        """
        Logs a summary of the derived CMOR handlers, including any missing or
        non-derivable handlers.
        """
        if self.handlers:
            cmip_to_e3sm_vars = {
                handler["name"]: handler["raw_variables"] for handler in self.handlers
            }

            logger.info("--------------------------------------")
            logger.info("| SUCCESS: Derived Variable Handlers")
            logger.info("--------------------------------------")
            logger.info(f" * Count: {len(self.handlers)}")
            logger.info(" * Variable Mappings (CMIP to E3SM):")
            for k, v in cmip_to_e3sm_vars.items():
                logger.info(f" * '{k}' -> {v}")

        if self.missing_handlers:
            logger.error("--------------------------------------")
            logger.error("| NOTICE: Missing Handlers")
            logger.error("---------------------------------------")
            logger.error(
                "Solution: Make sure handlers for these variables are defined "
                "in `handlers.yaml`."
            )
            logger.error(f" * Count: {len(self.missing_handlers)}")
            logger.error(f" * Variables: {self.missing_handlers}")

        if self.non_derivable_handlers:
            logger.error("--------------------------------------")
            logger.error("| NOTICE: Non-derivable Handlers")
            logger.error("---------------------------------------")
            logger.error(
                "Handlers were defined for these variables, but they could not "
                "be derived using the input E3SM datasets."
            )
            logger.error(
                "Possible Reasons: 1) No matching CMIP table was found for the "
                "requested frequency or 2) The input E3SM datasets don't have "
                "the required variables."
            )
            logger.error(f" * Count: {len(self.non_derivable_handlers)}")
            logger.error(f" * Variables: {self.non_derivable_handlers}")
Collaborator Author

This method logs the summary for handlers after attempting to derive them for each of the variables (--var-list).

Comment on lines +558 to +584
    def _exit_due_to_handler_issues(self) -> bool:
        """
        Determines if the program should exit due to missing or non-derivable
        handlers based on the ``on_var_failure`` setting.

        Returns
        -------
        bool
            True if the program should exit, False otherwise.
        """
        if not self.handlers:
            logger.error(
                "No variable handlers are defined or derivable from the raw "
                "variables found in the E3SM input datasets."
            )
            return True

        if self.missing_handlers or self.non_derivable_handlers:
            if self.on_var_failure in ["stop", "fail"]:
                logger.error(
                    "Exiting due to missing or non-derivable handlers with "
                    f"--on-var-failure={self.on_var_failure}."
                )

                return True

        return False
Collaborator Author

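This method decides whether to exit based on the state of the handlers after derivation has been attempted. If no handlers could be derived at all, it always exits; if only some are missing or non-derivable, it exits when --on-var-failure is "stop" or "fail".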

Comment on lines +633 to +634
# FIXME: This check is duplicated in mode 3 below. Refactor.
# --- DUPLICATE CODE ---
Collaborator Author

Added FIXME comments flagging duplicate code from a previous PR so it can be removed later.

Comment on lines +836 to +838
    if not is_cmor_successful:
        self._stop_with_failed_handler(handler["name"])

Collaborator Author

Stop behavior for failed handler during cmorizing.

return False

self._log_final_result(num_handlers, num_success, failed_handlers)
self._finalize_on_failure(failed_handlers)
Collaborator Author

Finalize on "fail" mode.

Comment on lines +934 to +937
    if not future_result:
        self._stop_with_failed_handler_parallel(
            handler_name, pool, pbar, futures
        )
Collaborator Author

Gracefully stop parallel jobs with "stop" mode.

pbar.close()
pool.shutdown()
self._log_final_result(num_handlers, num_success, failed_handlers)
self._finalize_on_failure(failed_handlers)
Collaborator Author

Finalize on "fail" mode.

Comment on lines 1042 to 1087
     def _log_final_result(
         self, num_handlers: int, num_successes: int, failed_handlers: list[str]
     ):
         """
         Logs the final result of the CMORization process.

         Parameters
         ----------
         num_handlers : int
             The total number of handlers that were processed.
         num_successes : int
             The number of handlers that completed successfully.
         failed_handlers : list[str]
             A list of handler names that failed during processing.
         """
-        logger.info("========== FINAL RUN RESULTS ==========")
-        logger.info(f"* {num_successes} of {num_handlers} handlers succeeded.")
         logger.info("")
+        logger.info("=======================================")
+        logger.info("| FINAL RUN SUMMARY")
+        logger.info("---------------------------------------")
+        logger.info(f" * Total variables (--var-list): {len(self.var_list)}")
+        logger.info(f" * Total handlers successfully derived: {num_handlers}")
+        logger.info(
+            f" * Total handlers successfully cmorized: {num_successes} / {num_handlers}"
+        )

         if failed_handlers:
             logger.error(
-                "* The following handlers failed: "
-                + ", ".join(str(h) for h in failed_handlers)
+                f" * Total handlers failed to cmorize: {len(failed_handlers)}"
             )
-        else:
-            logger.info("* All handlers completed successfully.")
+            logger.error(f" - Failed variables: {failed_handlers}")

+        if self.missing_handlers:
+            logger.error(
+                f" * Total handlers missing (not defined in handlers.yaml): "
+                f"{len(self.missing_handlers)}"
+            )
+            logger.error(f" - Includes: {self.missing_handlers}")

+        if self.non_derivable_handlers:
+            logger.error(
+                f" * Total handlers non-derivable (defined but not derivable): "
+                f"{len(self.non_derivable_handlers)}"
+            )
+            logger.error(f" - Includes: {self.non_derivable_handlers}")

+        logger.info("=======================================")
Collaborator Author

Improved final run summary formatting with helpful info on missing and non-derivable handlers.

@tomvothecoder tomvothecoder marked this pull request as ready for review October 10, 2025 22:29
@tomvothecoder tomvothecoder requested a review from Copilot October 10, 2025 22:29
Copilot AI left a comment

Pull Request Overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.



@TonyB9000
Contributor

TonyB9000 commented Oct 10, 2025

@tomvothecoder I did some testing earlier and the new exit (fail) mode works great (both for info-mode and run-mode).

I will re-run these tests with the latest PR. I assume this gets me there:

(dsm_v3_gen) [ac.bartoletti1@chrlogin1 e3sm_to_cmip]$ git status
On branch 272-force-non-zero
Your branch is up to date with 'origin/enhancement/272-force-non-zero'.

nothing to commit, working tree clean
(dsm_v3_gen) [ac.bartoletti1@chrlogin1 e3sm_to_cmip]$ git pull
remote: Enumerating objects: 129, done.
remote: Counting objects: 100% (129/129), done.
remote: Compressing objects: 100% (47/47), done.
remote: Total 129 (delta 93), reused 116 (delta 82), pack-reused 0 (from 0)
Receiving objects: 100% (129/129), 38.07 KiB | 2.93 MiB/s, done.
Resolving deltas: 100% (93/93), completed with 13 local objects.
From https://github.com/E3SM-Project/e3sm_to_cmip
   258794a..bf9bf82  enhancement/272-force-non-zero -> origin/enhancement/272-force-non-zero
 * [new branch]      bump/v0.13.0                   -> origin/bump/v0.13.0
   7878b86..eb844fd  jinboxie_qboi                  -> origin/jinboxie_qboi
   19e3ee8..f90e4ab  master                         -> origin/master
 * [new branch]      preserve-legacy-xr-settings    -> origin/preserve-legacy-xr-settings
 * [new tag]         v1.13.0rc1                     -> v1.13.0rc1
Updating 258794a..bf9bf82
Fast-forward
 e3sm_to_cmip/cmor_handlers/handler.py |   7 ++++--
 e3sm_to_cmip/cmor_handlers/utils.py   | 101 +++++++++++++++++++++++++++++++++++++++++++------------------------------------
 e3sm_to_cmip/runner.py                | 353 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---------------------------------------------------------------------------------------------------------------------------
 e3sm_to_cmip/util.py                  |  16 ++++++++-----
 tests/cmor_handlers/test_utils.py     |  37 ++++++++++++++++++++---------
 5 files changed, 293 insertions(+), 221 deletions(-)

pip install, etc ...

Successfully installed e3sm_to_cmip-1.13.0rc1

@TonyB9000
Contributor

TonyB9000 commented Oct 10, 2025

@tomvothecoder There is an issue with info-mode - maybe not directly related to the exit handling.

I ran the command:

e3sm_to_cmip --info --on-var-failure fail -v pr --freq 3hr --realm atm -t /lcrc/group/e3sm2/DSM/Staging/Resource/cmor/cmip6-cmor-tables/Tables --map no_map --info-out 3hr_pr.yaml

the output to the console reads:

2025-10-10 18:09:51.861280 [INFO]: runner.py(__init__:159) >> --------------------------------------
2025-10-10 18:09:51.861473 [INFO]: runner.py(__init__:160) >> | E3SM to CMIP Configuration
2025-10-10 18:09:51.861542 [INFO]: runner.py(__init__:161) >> --------------------------------------
2025-10-10 18:09:51.866843 [INFO]: runner.py(__init__:187) >>   * Timestamp: 20251010_230951_858896
2025-10-10 18:09:51.866912 [INFO]: runner.py(__init__:187) >>   * Version Info: version 1.13.0rc1
2025-10-10 18:09:51.866957 [INFO]: runner.py(__init__:187) >>   * Mode: Info
2025-10-10 18:09:51.866998 [INFO]: runner.py(__init__:187) >>   * Variable Failure Behavior (--on-var-failure): fail
2025-10-10 18:09:51.867038 [INFO]: runner.py(__init__:187) >>   * Variable List (--var-list): ['pr'] (1)
2025-10-10 18:09:51.867077 [INFO]: runner.py(__init__:187) >>   * Input Path (--input-path): None
2025-10-10 18:09:51.867115 [INFO]: runner.py(__init__:187) >>   * Output Path (--output-path): /lcrc/group/e3sm2/DSM/Ops/DSM_Manager/e3sm_to_cmip_run_20251010_230951_858896
2025-10-10 18:09:51.867154 [INFO]: runner.py(__init__:187) >>   * Precheck Path (--precheck): None
2025-10-10 18:09:51.867192 [INFO]: runner.py(__init__:187) >>   * Log Path (--logdir): /lcrc/group/e3sm2/DSM/Ops/DSM_Manager/e3sm_to_cmip_run_20251010_230951_858896/20251010_230951_858896.log
2025-10-10 18:09:51.867236 [INFO]: runner.py(__init__:187) >>   * CMOR Log Path (--logdir): /lcrc/group/e3sm2/DSM/Ops/DSM_Manager/e3sm_to_cmip_run_20251010_230951_858896/cmor_logs
2025-10-10 18:09:51.867282 [INFO]: runner.py(__init__:187) >>   * CMIP Metadata Path (--user-metadata): /lcrc/group/e3sm2/DSM/Ops/DSM_Manager/e3sm_to_cmip_run_20251010_230951_858896/user_metadata_2743611.json
2025-10-10 18:09:51.867322 [INFO]: runner.py(__init__:187) >>   * Temp Path for Processing MPAS Files: None
2025-10-10 18:09:51.867360 [INFO]: runner.py(__init__:187) >>   * Frequency (--freq): 3hr
2025-10-10 18:09:51.867398 [INFO]: runner.py(__init__:187) >>   * Realm (--realm): atm
2025-10-10 18:09:52.128810 [INFO]: runner.py(_log_handler_summary:523) >> --------------------------------------
2025-10-10 18:09:52.128919 [INFO]: runner.py(_log_handler_summary:524) >> | SUCCESS: Derived Variable Handlers
2025-10-10 18:09:52.128969 [INFO]: runner.py(_log_handler_summary:525) >> --------------------------------------
2025-10-10 18:09:52.129019 [INFO]: runner.py(_log_handler_summary:526) >>   * Count: 2
2025-10-10 18:09:52.129066 [INFO]: runner.py(_log_handler_summary:527) >>   * Variable Mappings (CMIP to E3SM):
2025-10-10 18:09:52.129116 [INFO]: runner.py(_log_handler_summary:529) >>     * 'pr' -> ['PRECC', 'PRECL']
2025-10-10 18:09:52.180292 [INFO]: util.py(exit_success:80) >> Exiting with success code (0).

The output to the file 3hr_pr.yaml reads:

- CMIP6 Name: pr
  CMIP6 Table: CMIP6_day.json
  CMIP6 Units: kg m-2 s-1
  E3SM Variables: PRECT

Now, the "--help" never says that --realm or --freq do not apply to info mode (but they should, in any case).

(It turns out that for my "Bad_Metadata" test, the first dataset I tried was the 3hr.pr, or else I would have missed this.)

I'll conduct a test with a different variable - but now I think we need to test all "ambiguous var-name" sets (where multiple frequencies are involved).

@TonyB9000
Contributor

@tomvothecoder Update. This may not be a real problem. There are only two handlers for "pr", mon.json and day.json. There is no 3hr.json. There should be one for name-consistency. Currently the one labeled "Amon_day.json" is really "Amon_sub_mon.json".

I will continue to investigate (look for failures).

@tomvothecoder
Collaborator Author

> @tomvothecoder Update. This may not be a real problem. There are only two handlers for "pr", mon.json and day.json. There is no 3hr.json. There should be one for name-consistency. Currently the one labeled "Amon_day.json" is really "Amon_sub_mon.json".
>
> I will continue to investigate (look for failures).

Hey @TonyB9000, any updates on your review for this PR? If you approve, I will do a final self-review before merging.

@tomvothecoder tomvothecoder mentioned this pull request Oct 20, 2025
5 tasks
@tomvothecoder tomvothecoder mentioned this pull request Oct 30, 2025
9 tasks

Labels

enhancement New feature or request

Projects

Status: In progress

Development

Successfully merging this pull request may close these issues.

[Feature]: Flag to force non-zero exit status if any variable failed.

3 participants